(54) Voice control input for portable capture devices

(57) A portable capture device (100), such as a 
hand held document scanner or digital camera, that 
receives voice commands for operation control is disclosed.
Commands such as "scan", "save", "delete", "left",
"zoom in", and "send" are illustrative, where the
capture device (100) will perform the predetermined 
functions associated with the command names. In one 
embodiment, the portable capture device (100) trains 
itself to recognize the user's spoken commands through 
voice analysis software (124). The voice analysis soft- 
ware (124) may be located within the capture device 
(100), or on a host computer system (200) and 
accessed by the capture device (100) while tethered to
the host computer system (200). The capture device 
(100) has an audio input/output system under the con- 
trol of a controller (106). Upon receiving a voice control 
input command, the controller (106) saves the digitized 
voice input in dynamic memory (118). The controller 
(106) then compares the command received with the 
commands stored in a command recognition table (126) 
held in static memory (116). 
Description DISCLOSURE OF THE INVENTION



TECHNICAL FIELD 

[0001] This invention relates to portable capture 
devices such as hand held document scanners or digital 
cameras. Even more particularly, the invention relates 
to voice control input for portable hand held document 
scanners or digital cameras. 

BACKGROUND OF THE INVENTION 

[0002] Portable capture devices, such as hand held 
document scanners or digital cameras, have proven to 
be very useful tools in certain situations. Their portabil- 
ity and ease in capturing and saving information from 
various locations away from a user's office or work place 
are the primary benefits of such capture devices. 
[0003] Though such portable capture devices are 
small, reducing their size to be even smaller and more 
portable is desirable. However, further reductions in size 
are fairly limited by the current physical user interface 
requirements. Most portable hand held document scan- 
ners, for example, have anywhere from ten to fifteen 
user input buttons to allow the user to control a number
of different operations. Such operations include: start 
and stop scanning; save and delete scanned informa- 
tion; send scanned information; and view, zoom, and 
pan scanned data on the scanner display. The buttons 
must be large enough and adequately spaced to allow a 
user to easily control and press the buttons. The buttons 
must also be placed in such a fashion that the portable 
scanner device can be handled by the user without 
pressing buttons to activate various functions not 
intended to be activated in the normal transport and 
handling of the capture device, and while using the portable
scanner device to scan a document. Buttons sometimes
must be used in combination, making the scanner
device somewhat awkward to use. Due to the physical 
space occupied by the user input buttons, the output 
display on such capture devices is often quite small by 
necessity, making use of the display less functional than 
desired. The same can be said for portable digital cam- 
eras. 

[0004] It is thus apparent that there is a need in the 
art for an improved method or apparatus which will 
reduce the number of user input buttons required to 
operate the portable capture device and at the same 
time reduce the complexity of the user interface. There 
is also a need in the art to further reduce the size of 
portable capture devices to further increase their porta- 
bility and ease of use. A further need in the art is to uti- 
lize a larger, more readable display in portable capture 
devices while maintaining a reduced overall size for the 
portable capture device. The present invention meets 
these and other needs in the art.



[0005] It is an aspect of the present invention to utilize
user voice input to control the operation of a portable
capture device, such as a hand held document
scanner or digital camera.

[0006] It is another aspect of the invention to reduce 
the number of user input buttons on a portable capture 
device. 

[0007] Yet another aspect of the invention is to
reduce the overall size of a portable capture device 
through the elimination of a number of user input but- 
tons. 

[0008] Still another aspect of the invention is to
increase the output display area of a portable capture
device while decreasing the overall size of the portable
capture device by utilizing some of the physical space
formerly occupied by a number of user input buttons
that have been eliminated.

[0009] A further aspect of the invention is to key the
operation of a portable capture device to an audible
password spoken by a user.

[0010] A still further aspect of the invention is to 
tether a portable capture device to a host computer to 
train the portable capture device to recognize a user's
voice control input commands. 
[0011] Another further aspect of the invention in 
another embodiment is to utilize a limited voice control 
input command set in a portable capture device that 
does not require training by a host computer.

[0012] The above and other aspects of the invention
are accomplished in a portable capture device that
receives voice control input commands to control its
operation. To initiate an action with a portable capture
device, such as a scan with a portable hand held document
scanner, the user powers on the capture device
and then inputs the voice control input command
"scan", which is picked up by the capture device through
a voice pickup component located in the capture device.
Upon recognizing the command "scan", the capture
device will wait a predetermined amount of time, usually
a few seconds, for the user to position the capture
device on a document. After the time delay, the capture
device is ready to scan, which is indicated to the user by
an audible beep or audible repeat of the word "scan".
The user then moves the portable hand held document
scanner across the surface of the document. Upon
detecting lack of movement for a predetermined period
of time, the portable hand held document scanner will
once again beep or output another audible word such
as "done" or "stop" to indicate to the user that the capture
device believes it should no longer be in scan
mode. If the capture device detects no further movement
within a predetermined amount of time from the
beep or audible word output, usually a few seconds, the
portable hand held document scanner leaves the scan
mode and begins processing the scan data for output to
the user on the portable hand held document scanner
display. In an alternative embodiment of the invention,
the user pushes a button on the portable hand held document
scanner to stop the scan mode. The portable
hand held document scanner then processes the scan
data for output to the user.
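
The scan-mode timing described above can be illustrated with a short sketch. It is not taken from the specification: the function name, the representation of movement as a list of timestamps, and the two timeout values (2 and 3 seconds) are assumptions chosen only to make the "lack of movement, audible warning, then leave scan mode" sequence concrete.

```python
def scan_end_time(movement_times, idle_timeout=2.0, confirm_timeout=3.0):
    """Simulate when the device leaves scan mode.

    movement_times: seconds (measured from the "ready" beep) at which
    movement of the scanner was detected. Hypothetical input format.
    Returns (end_time, warning_times), where warning_times are the moments
    the device would beep or say "done" after detecting lack of movement.
    """
    last_movement = 0.0
    warnings = []
    for t in sorted(movement_times):
        gap = t - last_movement
        if gap >= idle_timeout + confirm_timeout:
            break  # scan mode already ended before this movement occurred
        if gap >= idle_timeout:
            # The device warned the user, but movement resumed in time,
            # so it stays in scan mode.
            warnings.append(last_movement + idle_timeout)
        last_movement = t
    # Final warning, followed by the confirmation window with no movement.
    warnings.append(last_movement + idle_timeout)
    return last_movement + idle_timeout + confirm_timeout, warnings
```

With movement at 0.5, 1.0, 1.5, 4.0 and 4.5 seconds, the sketch warns once at 3.5 seconds (the user resumed scanning), warns again at 6.5 seconds, and leaves scan mode at 9.5 seconds.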

[0013] Once the image is output to the display, the 
user can issue a voice control input command to "save" 
or "delete" the scanned image. The user may also view 
different parts of the image by issuing voice control 
input commands such as "zoom in", "zoom out", "left", 
"right", "up", or "down". The user may also transfer a 
scanned image, or several images, to a host computer 
through an established connection by issuing voice con- 
trol input commands such as "send" or "send all". Once 
the capture device recognizes the command, it per- 
forms the desired operation. If the capture device proc- 
esses a voice control input command and finds no 
match, an indication of no match, such as an audible 
word or a beep pattern, is output to the user. The cap- 
ture device then waits to receive the next voice control 
input command. 

[0014] Voice control input allows a means for the 
capture device to be "keyed" to a particular user through 
the use of a spoken password. Once the capture device 
is powered on, it will not function until the proper pass- 
word is received and processed. This would prevent 
anyone except the user from using the capture device 
as long as the user's password is not overheard. 
[0015] In one embodiment of the invention, a portable
capture device, such as a portable hand held document
scanner, is trained to recognize the user's spoken
voice control input commands through voice analysis 
software. The voice analysis software may be located 
within the capture device, or located in a host computer 
system and accessed by the capture device while teth- 
ered to the host computer system. In the preferred 
embodiment of the invention, the tethered mode is used 
to take advantage of the greater computing power avail- 
able in the host computer system and to reduce the 
complexity of the capture device. 
[0016] For example, in using the voice analysis software
in the training mode, the user would be given a
predetermined list of the functions that can be executed 
by the capture device with a voice control input com- 
mand. Command one, for example, may represent a set 
of instructions for performing a scan function of a docu- 
ment or image. In selecting command one for training 
and analysis, the user would be prompted by the voice 
analysis software to choose a word that the user wants 
to use to invoke the set of instructions for the scan func- 
tion. The user would then be prompted to repeat the 
chosen word a number of times. A logical choice would 
be to choose the word "scan", but any word chosen by 
the user could be used. Each repetition of the word 
"scan" is picked up by the capture device and analyzed 
by the voice analysis software to develop a recognition 
pattern to encompass the variations and inflections in 
the user's voice in issuing the "scan" command. The recognition
patterns for all the words chosen by the user to
invoke the various functions are stored in a static memory
in the capture device in a command recognition
table. The recognition patterns in the command recognition
table are each linked to the predetermined sets of
instructions for the various functions, which are also
stored in the static memory. Thus, when the spoken
voice control input command word is received and recognized
by the capture device, the set of instructions
associated with that command word is executed. This
embodiment is language independent, enabling foreign
languages to be utilized for the voice control input command
words, since the set of instructions for a function
is tied to the user's word choice and subsequent training
and voice analysis of that word choice.
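
One possible way to picture a command recognition table is sketched below. The specification does not define a data layout, so the `TableEntry` class, the three-element example patterns, the Euclidean distance measure, and the `max_distance` threshold are all illustrative assumptions. The point of the sketch is that each entry stores only a recognition pattern and its linked instruction set, never the spoken word itself, which is what makes the trained embodiment language independent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class TableEntry:
    pattern: Sequence[float]               # recognition pattern from training
    instructions: List[Callable[[], str]]  # linked set of instructions


def do_scan() -> str:
    return "scanning"


def do_save() -> str:
    return "saved"


# Hypothetical table contents; real patterns would come from voice analysis.
table = [
    TableEntry(pattern=[0.9, 0.1, 0.0], instructions=[do_scan]),
    TableEntry(pattern=[0.0, 0.2, 0.9], instructions=[do_save]),
]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two patterns of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def find_match(entries, pattern, max_distance=0.3) -> Optional[TableEntry]:
    """Return the closest entry, or None when nothing is close enough."""
    best = min(entries, key=lambda e: distance(e.pattern, pattern), default=None)
    if best is not None and distance(best.pattern, pattern) <= max_distance:
        return best
    return None


def run_entry(entry: TableEntry) -> List[str]:
    """Execute every instruction linked to a matched pattern."""
    return [step() for step in entry.instructions]
```

A pattern close to the first entry, such as `[0.85, 0.15, 0.0]`, selects the scan instructions, while a pattern far from every entry yields `None`, corresponding to the "no match" indication.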

[0017] In another embodiment of the invention, 
there is no word choice given the user for training and 
voice analysis. The recognition patterns in the command
recognition table are predetermined and tied to
specific words the user must use. The user would have
to modify his or her pronunciation of the command word
"scan", for example, until the capture device recognized
the command as spoken by the user. Thus, in this
embodiment, the device would be primarily directed to a
particular language where the command words were
indicative of the resulting actions. Foreign language versions
of the device could be made for users utilizing foreign
words indicative of the resulting actions.
[0018] The portable capture device has a voice
audio input/output system under the control of a controller.
Upon receiving a voice control input command, the
controller saves the digitized voice input in dynamic
memory. The controller then processes the command
and compares the recognition pattern for the command
with the recognition patterns stored in the command
recognition table held in static memory. When a match
is found, execution of the set of instructions tied to the
recognition pattern begins. The set of instructions for a
particular command may include acknowledging the
command back to the user by outputting an audible
beep, audible playback of the command name, or illuminating
a light emitting diode (LED). Particular commands
may also have one or more time delays built into
the set of instructions to allow time for the user to physically
manipulate the capture device or to cancel the
command. If the user has changed his mind about the
command just issued, or if the capture device interpreted
the command incorrectly, the user can cancel the
command before it is executed through a cancel or clear
button on the capture device or through a voice control
input command that cancels the previous command
received. Otherwise, if no input is received to cancel the
command, the set of instructions for the command is
executed.

[0019] For portable capture devices that allow voice
annotation of captured image data files, such as with a
digital camera, or document data files, such as with a
portable scanner, the capture device distinguishes a
voice control input command from a voice annotation. In
one embodiment of the invention, a voice control input 
annotation command is used to prepare the capture 
device to accept the immediately following voice input 
as a voice annotation to the current image data file or 
document data file. A predetermined length of time of 
silence without voice input serves as the indication that 
the voice annotation is complete. In another embodi- 
ment of the invention, a use paradigm similar to a record 
button on a tape recorder is utilized. A button on the 
capture device is pressed and held down to signify that 
the following voice input is for annotation purposes, and 
not a command. Once the voice annotation is complete, 
the user releases the button, and the captured voice 
annotation is processed by the capture device and con- 
nected to the current image data file or document data 
file. 

DESCRIPTION OF THE DRAWINGS 

[0020] The above and other aspects, features, and 
advantages of the invention will be better understood by 
reading the following more particular description of the 
invention, presented in conjunction with the following
drawings, wherein:

FIG. 1 shows a block diagram of a capture device of 
the present invention; 

FIG. 2 shows a block diagram of a host computer 
system in communication with the capture device of 
the present invention; 

FIG. 3 shows a flow chart of the overall flow of voice 
control input for the operation of the capture device 
of the present invention; 

FIG. 4 shows a flow chart of processing a voice 
control input command by the capture device of the 
present invention; 

FIG. 5 shows a generalized flow chart for executing 
a command by the capture device of the present 
invention; and

FIG. 6 shows a flow chart of training the capture
device of the present invention to recognize voice 
control input commands. 

BEST MODE FOR CARRYING OUT THE INVENTION 

[0021] The following description is of the best pres- 
ently contemplated mode of carrying out the present 
invention. This description is not to be taken in a limiting 
sense but is made merely for the purpose of describing 
the general principles of the invention. The scope of the 
invention should be determined by referencing the 
appended claims. 

[0022] FIG. 1 shows a block diagram of a capture 
device of the present invention. Referring now to FIG. 1,
capture device 100 is powered on by pressing a power
on button, which is one of several control buttons 120 on
capture device 100. Capture device 100 receives its
power from internal batteries (not shown in FIG. 1), or
alternatively through a power cable connected to capture
device 100 and plugged into a power source (also
not shown in FIG. 1). Voice control input commands for
controlling capture device 100 are given by a user
speaking in close enough proximity to be picked up by
voice pickup component 102. Voice pickup component
102 converts the user's speech into an analog signal.
Connected to voice pickup component 102 is an analog-to-digital
converter 104, which converts the analog
signal generated by voice pickup component 102 into a
digital signal. The digital signal is sent by analog-to-digital
converter 104 to controller 106, which saves the signal
in dynamic memory 118, which is connected to
controller 106. Then, in the preferred embodiment of the
invention, controller 106 calls voice analysis software
124 stored in static memory 116 to perform a series of
frequency domain transforms on the digital signal
stored in dynamic memory 118. Voice analysis software
124 generates a recognition pattern, which is a spectral
transform, that is compared to recognition patterns (also
spectral transforms) for commands stored in static
memory 116 in command recognition table 126. One
skilled in the art will recognize that any other suitable
method for recognizing voice patterns could be used in
the present invention instead of spectral transforms.
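
As a rough illustration of a spectral recognition pattern, the sketch below computes the magnitudes of a few discrete Fourier transform bins and compares two patterns by cosine similarity. The specification does not say which transforms or comparison measure are used, so the naive single-frame DFT, the number of bins, the normalization, and all function names here are assumptions.

```python
import cmath
import math


def spectral_pattern(samples, bins=16):
    """Return a unit-length magnitude spectrum over DFT bins 1..bins.

    A naive O(n * bins) DFT is used so the sketch needs only the
    standard library; bins should not exceed len(samples) // 2.
    """
    n = len(samples)
    mags = []
    for k in range(1, bins + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        mags.append(abs(acc))
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0  # avoid dividing by 0
    return [m / norm for m in mags]


def similarity(a, b):
    """Cosine similarity of two unit-length patterns (1.0 means identical)."""
    return sum(x * y for x, y in zip(a, b))


def tone(freq_bin, n=64):
    """Synthetic test signal: a sine wave centered on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]
```

Two signals with the same spectral content produce a similarity close to 1, while tones at different frequencies produce a similarity close to 0. A practical recognizer would analyze many short frames over the length of the spoken word rather than a single frame.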
[0023] If there is a match, then controller 106
accesses the set of instructions in command recognition
table 126 linked with the recognition pattern for the
command. For example, after speaking a voice control
input command to scan a document, the user moves
capture device 100 such that image pickup component
112 comes in contact with a portion or all of the surface
of the document. Image pickup component 112 optically
reads sample points from the surface of the document
and generates a grey scale value for each point sampled.
Controller 106 receives the grey scale values for
the sample points and assembles them into an image
array. The result may be output to display 114, which is
connected to controller 106, showing a visual representation
of the surface of the scanned document. Controller
106 may also convert the grey scale values to binary
form for display or for storage. The image array, in either
grey scale or binary form, is passed from controller 106
and stored as a document data file in static memory
116.
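
The conversion of grey scale values to binary form can be pictured as a simple threshold applied to each sample point. The specification does not state how the conversion is done, so the fixed threshold of 128 on an assumed 0 to 255 grey scale, and the convention that 1 marks the lighter values, are assumptions made for this sketch only.

```python
def to_binary(grey_rows, threshold=128):
    """Map each grey scale value to 1 (at or above threshold) or 0 (below)."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in grey_rows]
```

For example, the image array `[[0, 200], [128, 127]]` becomes `[[0, 1], [1, 0]]`.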

[0024] After scanning a document, the user may
speak into voice pickup component 102 to voice annotate
the document data file with a descriptive narrative
or other information deemed useful by the user. To distinguish
a voice annotation, which is a fairly continuous
stream of voice input over an extended period of time,
from a voice control input command, which is normally
just one or two words, in one embodiment of the invention,
the user presses and holds down one of the several
control buttons 120 before speaking, sending
button down input to controller 106, indicating that the
following stream of voice input is an annotation and not
a command. After the user finishes the voice annotation,
the user releases the control button 120, sending
button up input to controller 106, which marks the end of
the stream of voice input. The stream of voice input that
was captured is stored as a voice annotation file in static
memory 116, and connected to a document data file
that has been scanned and stored in static memory
116.

[0025] In another embodiment of the invention, one 
of the voice control input commands is a voice annota- 
tion command. After issuing the voice control input 
annotation command, the following stream of voice 
input is captured for annotation purposes, and stored as 
a voice annotation file, and connected to an image data 
file or document data file that has been captured and 
stored in the capture device. When the user stops 
speaking for more than a predetermined period of time, 
such as between five to ten seconds, the device inter- 
prets such predetermined period of absence of voice 
input as marking the end of the stream of voice input.
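
The silence-based end of annotation can be sketched as a counter over fixed-length audio frames. The frame length of 0.5 seconds, the loudness threshold, and the function name are assumptions; only the idea of a predetermined silent period (here 5 seconds, the low end of the range given above) comes from the text.

```python
def annotation_length(frame_levels, frame_seconds=0.5,
                      silence_level=0.05, silence_timeout=5.0):
    """Return how many leading frames belong to the annotation.

    frame_levels: one loudness value per frame (hypothetical units).
    The annotation ends once silence has lasted silence_timeout seconds;
    the trailing silent frames are not counted as annotation.
    Returns None if the silent period has not yet been reached.
    """
    frames_needed = int(round(silence_timeout / frame_seconds))
    quiet = 0
    for index, level in enumerate(frame_levels):
        quiet = quiet + 1 if level < silence_level else 0
        if quiet >= frames_needed:
            return index + 1 - quiet
    return None
```

A short pause in the middle of the narration resets nothing permanently: only an unbroken silent run of the full timeout ends the annotation.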
[0026] Upon receiving and recognizing the voice 
control input annotation command, or the pressing and 
holding of one of several control buttons 120 indicating 
that the following voice input is for annotation purposes, 
the voice input from the user is captured by voice pickup 
component 102 and converted to an analog signal. Ana- 
log-to-digital converter 104 converts the analog signal 
generated by voice pickup component 102 into a digital 
signal. The digital signal is sent to controller 106. Con- 
troller 106 stores the voice annotation digital signal as a 
separate voice annotation file in static memory 116 and 
connects the image data file or document data file with 
the voice annotation file. 

[0027] The user may request that document scan- 
ning device 100 play back a voice annotation file. Con- 
troller 106 retrieves the voice annotation file requested 
from static memory 116, passes it to digital-to-analog 
converter 108, which converts the digital signal to an 
analog signal, and passes the analog signal to speaker 
110, which generates audio output. In addition, a partic- 
ular set of instructions for a command may send audible 
output to the user to acknowledge receipt of the com- 
mand utilizing digital-to-analog converter 108 and 
speaker 110, or illuminating an LED (not shown in FIG. 
1). 

[0028] Image data files or document data files and 
the connected voice annotation files may be copied to 
another device, such as host computer system 200 
(FIG. 2) through host connection 122, which is con- 
nected to controller 106. 

[0029] FIG. 2 shows a block diagram of a host com- 
puter system associated with the present invention. 
Referring now to FIG. 2, host computer system 200 con- 
tains a processing element 202. Processing element 
202 communicates to other elements of host computer 
system 200 over a system bus 204. A keyboard 206 
allows a user to input information into host computer 
system 200 and a graphics display 210 allows host
computer system 200 to output information to the user.
A mouse 208 is also used to input information, and a
storage device 212 is used to store data and programs
within host computer system 200. Communications
interface 214, also connected to system bus 204,
receives information from capture device 100 (FIG. 1).
Speaker/sound card 216, connected to system bus 204,
outputs audio information to the user. Some host computer
systems may not have a sound card, in which
case the speaker is driven only by software. A memory
218, also attached to system bus 204, contains an operating
system 220, file transfer software 222, voice analysis
software 224, user interface program 226, and
audio file conversion software 228.
[0030] File transfer software 222 receives image
data files or document data files and the connected
voice annotation files transferred from host connection
122 (FIG. 1) of capture device 100 through communications
interface 214 and system bus 204, and saves
them to storage device 212. When the user accesses
user interface program 226, and selects an image data
file or document data file having a voice annotation file,
audio file conversion software 228 decompresses and
converts the voice annotation file to an audio file format
recognizable by speaker/sound card 216.
Speaker/sound card 216 outputs the audio information
to the user. After hearing the audio information, the user
may choose to view the image data file or document
data file. If so, user interface program 226 is suspended,
the application program associated with the image data
file or document data file is called, and the file is displayed
in graphics display 210.
[0031] In the preferred language independent
embodiment of the invention, voice analysis software
224, which is also located in capture device 100, is used
by a user to train capture device 100 to recognize the
user's voice control input commands in any language.
Capture device 100 is first connected to host computer
system 200 to take advantage of the greater computing
power. The user then accesses voice analysis software
224 and selects a particular function to train, such as the
scan function, to be represented by a word chosen by the
user to invoke the function. The user then repeats
the word chosen by the user to represent the scan function
a number of times. The word most likely to be chosen
by the user is the word, in whatever language the
user speaks, that is equivalent or closest to the scan
function. For an English speaking user, the most likely
word chosen would be the word "scan". The user's repetition
of the word "scan" is captured by voice pickup
component 102 (FIG. 1), is processed in capture device
100 into a signal, and transferred via host connection
122 to communications interface 214. Communications
interface 214 transfers each signal via system bus 204
to memory 218, where voice analysis software 224 analyzes
each signal. Voice analysis software 224 develops
a recognition pattern based on each sample signal to
encompass the variations and inflections in the user's
voice in issuing the "scan" command. This process is
repeated for each of the functions that can be invoked
with a voice input control command for capture device
100. The recognition patterns established for all the
words chosen for training are then downloaded from
host computer system 200 to capture device 100, and
stored in static memory 116 (FIG. 1) in command recognition
table 126 for use in subsequent control operations.

[0032] FIG. 3 shows a flow chart of the overall flow
of operation for voice control input of a capture device.
Referring now to FIG. 3, in step 300 capture device 100
(FIG. 1) is powered on. In step 302 a first voice control
input command, which in the preferred embodiment of
the invention would be the password, is received by
voice pickup component 102 (FIG. 1). Step 304 calls
FIG. 4 to process the voice control input command.
Upon returning from FIG. 4, step 306 determines if a
match was found in the comparison performed in step
408 from FIG. 4 between the recognition pattern of the
voice control input command received in step 302 and
any of the recognition patterns stored in command recognition
table 126. If no match was found, step 310 outputs
an indication of no match to the user, which may be
an audible word or a specific beep pattern. Control then
returns to step 302 where capture device 100 awaits the
next voice control input command.
[0033] If step 306 determines that a match was
found in step 408 from FIG. 4, then control passes to
step 308 which calls FIG. 5 to execute the set of instructions
associated with the command. Upon returning
from FIG. 5, step 312 determines if a next voice control
input command is received, or if the power is turned off.
If a next command is received, control returns to step
302. If the power is turned off, then operation of capture
device 100 ends.
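
The overall flow of FIG. 3 can be summarized as a loop. In the sketch below the `process` and `execute` callables stand in for FIG. 4 and FIG. 5 respectively, and the end of the input sequence stands in for the power being turned off; these names and that modeling choice are assumptions, and the password handling of the preferred embodiment is left out for brevity.

```python
def run_device(inputs, process, execute):
    """Drive the FIG. 3 loop over a finite sequence of voice inputs.

    process(voice_input) returns a matched command or None (FIG. 4).
    execute(match) runs the linked instructions (FIG. 5).
    Returns a log of what the device did.
    """
    log = []
    for voice_input in inputs:        # step 302: receive next command
        match = process(voice_input)  # step 304: process the input
        if match is None:             # step 306: was a match found?
            log.append("no match")    # step 310: tell the user
            continue                  # back to step 302
        log.append(execute(match))    # step 308: execute the command
    log.append("power off")           # step 312: power turned off
    return log
```

For instance, with a dictionary lookup as a stand-in recognizer, `run_device(["scan", "mumble"], {"scan": "scanned"}.get, lambda m: m)` logs a successful scan, then a no-match indication, then power off.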

[0034] FIG. 4 shows a flow chart of processing a
voice control input command by the capture device of
the present invention. Referring now to FIG. 4, in step
400 the voice control input command captured by voice
pickup component 102 (FIG. 1) is output by voice pickup
component 102 as an analog signal. In step 402 analog-to-digital
converter 104 (FIG. 1) receives as input the
analog signal, converts the analog signal to a digital signal,
and outputs the digital signal to controller 106 (FIG.
1). In step 404, controller 106 receives as input the digital
signal and stores the digital signal in dynamic memory
118 (FIG. 1). In step 406 controller 106 calls voice
analysis software 124 to perform frequency domain
transforms on the digital signal stored in step 404, creating
a recognition pattern. In step 408, controller 106
compares the recognition pattern from step 406 with the
recognition patterns for voice control input commands
stored in command recognition table 126 held in static
memory 116 (FIG. 1). Control then returns to FIG. 3.
[0035] FIG. 5 shows a generalized flow chart for 
executing a command by the capture device of the 
present invention. One skilled in the art will recognize
that the order of the steps may vary greatly depending
upon the desired operation associated with a specific 
command. Referring now to FIG. 5, step 500 accesses 
the set of instructions linked to the recognition pattern 
matching the voice control input command received in 
step 302. Step 502 determines if the set of instructions 
begins with a time delay instruction. If the answer is yes, 
then in step 504 the time delay instruction is executed, 
suspending further execution of the remaining instruc- 
tions in the set of instructions until the amount of time 
specified in the time delay has elapsed. After the time 
delay of step 504, or if step 502 determined there was 
no time delay instruction, control passes to step 506. 
[0036] Step 506 determines if the next instruction in 
the set of instructions requires an output of acknowledg- 
ment of the command. If the answer is yes, then in step 
508 the acknowledgment instruction is executed. 
Depending on the particular command, the acknowl- 
edgment may be made in the form of an audible beep, a 
voice playback of the voice control input command 
received, illuminating an LED, or any other appropriate 
means. After the acknowledgment instruction is exe- 
cuted in step 508, or if step 506 determined there was 
no acknowledgment instruction, control passes to step 
510. 

[0037] Step 510 determines if the next instruction in
the set of instructions requires confirmation input by the 
user before further execution of the remaining instruc- 
tions. Certain commands, such as the delete command, 
may require confirmation as a safety precaution to help 
prevent the inadvertent destruction of valuable data. If 
the answer in step 510 is yes, then step 512 determines
if the proper confirmation input is received from the user.
Based on the particular command, the confirmation
may require the user to press one of the several control
buttons 120. Or, the user may have to issue another
voice control input command as confirmation. If the
proper confirmation input is not received, or no input at
all is received in step 512, control returns to step 302 in
FIG. 3 to await the next voice control input command. If 
the proper confirmation input is received in step 512, or 
if step 510 determined there was no confirmation 
instruction, control passes to step 514. 
[0038] Step 514 determines if input to cancel the 
command is received. If cancel input is received in step 
514, then control returns to step 302 in FIG. 3 to await 
the next voice control input command. If no cancel input 
is received in step 514, then control passes to step 516
which executes the remaining instructions in the set of 
instructions for the command. Upon executing the last 
instruction in the set of instructions, control returns to 
step 312 in FIG. 3. 
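
The generalized flow of FIG. 5 may be easier to follow as code. The sketch below represents a set of instructions as a list of `(name, argument)` pairs, with the hypothetical names "delay", "ack" and "confirm" marking the optional leading instructions; this encoding, the parameter names, and the injected `sleep` function are assumptions rather than anything the specification prescribes.

```python
def execute_command(instructions, confirm_input=None, cancel_input=False,
                    sleep=lambda seconds: None):
    """Walk one instruction set in the order of FIG. 5.

    Returns ("done", outputs) when the remaining instructions ran, or
    ("aborted", outputs) when confirmation failed or cancel was received.
    """
    steps = list(instructions)
    outputs = []
    if steps and steps[0][0] == "delay":    # steps 502 and 504
        sleep(steps.pop(0)[1])
    if steps and steps[0][0] == "ack":      # steps 506 and 508
        outputs.append(steps.pop(0)[1])
    if steps and steps[0][0] == "confirm":  # steps 510 and 512
        expected = steps.pop(0)[1]
        if confirm_input != expected:
            return "aborted", outputs       # control returns to step 302
    if cancel_input:                        # step 514
        return "aborted", outputs           # control returns to step 302
    for name, argument in steps:            # step 516
        outputs.append(f"{name}:{argument}")
    return "done", outputs                  # control returns to step 312
```

A delete command encoded as `[("ack", "beep"), ("confirm", "yes"), ("erase", "image1")]` only erases the image when the confirmation input "yes" is supplied; without it the command is abandoned after the acknowledgment beep.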

[0039] FIG. 6 shows a flow chart of training the cap- 
ture device of the present invention to recognize user 
voice control input commands. Referring now to FIG. 6, 
in step 600 voice analysis software 224 is loaded into 
memory 218 in host computer system 200 (FIG. 2). 
Capture device 100 is powered on in step 602. In step 
604 capture device 100 (FIG. 1) is connected to host
computer system 200. This could be through a cable, an 
infra-red beam, or any other suitable connection. In step 
606, input from a user is received in voice analysis soft- 
ware 224 selecting a first function for training and voice 
analysis of the command word for invoking the function. 
Voice analysis software 224 then prompts the user in 
step 608 to audibly repeat the command word the user 
has chosen to invoke the first function into voice pickup
component 102 (FIG. 1) of capture device 100 a multi- 
ple number of times. In step 610, the multiple voice 
inputs of the command word captured by voice pickup 
component 102 are processed by capture device 100 
into digital signals and sent to voice analysis software 
224 in host computer system 200. The voice analysis 
software 224 in step 612 analyzes the multiple digital 
signals received in step 610 and develops a recognition 
pattern for the command word. The recognition pattern 
of step 612 is stored in memory 218 in step 614. 
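
One way steps 610 through 612 might be realized is sketched below. The description does not state how voice analysis software 224 develops the recognition pattern, so the approach here, averaging the magnitude spectra of the repeated recordings, is purely an assumption for illustration. The function names `magnitude_spectrum` and `develop_pattern` are hypothetical, and the recordings are assumed to be equal-length lists of digitized samples.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Naive discrete Fourier transform magnitudes for bins 0..n/2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def develop_pattern(recordings: Sequence[Sequence[float]]) -> List[float]:
    """Average the spectra of several recordings of one command word."""
    if not recordings:
        raise ValueError("at least one recording is required")
    spectra = [magnitude_spectrum(r) for r in recordings]
    bins = len(spectra[0])
    if any(len(s) != bins for s in spectra):
        raise ValueError("recordings must have equal length")
    # Assumed analysis: the pattern is the per-bin mean magnitude.
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(bins)]
```

Averaging over several repetitions is one simple way to smooth out variation between utterances of the same word; a practical system would likely also normalize for loudness and duration.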
[0040] Step 616 determines if the user has selected 
a next function for training and voice analysis of the 
command word for invoking the next function, or if an 
indication is received that the user is done selecting 
functions for training and voice analysis. If a next func- 
tion has been selected in step 616, control returns to 
step 606. If an indication is received that the user is 
done selecting functions, then in step 618 voice analy-
sis software 224 transfers all recognition patterns deter-
mined in step 612 and stored in step 614 to capture
device 100 over the connection established in step 604.
In step 620 the recognition patterns transferred in step
618 are stored in static memory 116 in command recog-
nition table 126, such that the recognition pattern for
each function is linked to the set of instructions, also 
stored in command recognition table 126, that will be 
executed upon receiving the voice control input com- 
mand that, when processed into a recognition pattern, 
matches one of the recognition patterns determined in 
step 612. After step 620 training and voice analysis of 
command words for capture device 100 ends. 
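
The linkage described for command recognition table 126, in which each stored recognition pattern is tied to the set of instructions it invokes, can be illustrated with the following sketch. The class and method names are hypothetical, and the matching rule (nearest stored pattern by Euclidean distance, accepted only within a caller-supplied threshold) is an assumption, since the description does not specify how a match is decided.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class TableEntry:
    pattern: List[float]                    # recognition pattern from training
    instructions: List[Callable[[], None]]  # linked set of instructions


class CommandRecognitionTable:
    """Hypothetical in-memory model of command recognition table 126."""

    def __init__(self) -> None:
        self._entries: Dict[str, TableEntry] = {}

    def store(self, function_name: str, pattern: Sequence[float],
              instructions: Sequence[Callable[[], None]]) -> None:
        # Step 620: keep the pattern together with its instructions.
        self._entries[function_name] = TableEntry(list(pattern),
                                                  list(instructions))

    def match(self, pattern: Sequence[float],
              max_distance: float) -> Optional[str]:
        """Return the closest function within max_distance, else None."""
        best_name, best_dist = None, max_distance
        for name, entry in self._entries.items():
            if len(entry.pattern) != len(pattern):
                continue  # patterns of different size cannot be compared
            dist = math.dist(entry.pattern, pattern)
            if dist <= best_dist:
                best_name, best_dist = name, dist
        return best_name

    def instructions_for(self, function_name: str) -> List[Callable[[], None]]:
        return self._entries[function_name].instructions
```

Returning `None` when nothing falls within the threshold corresponds to the case in which a spoken input matches no stored command.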
[0041] Having thus described a presently preferred
embodiment of the present invention, it will be under- 
stood by those skilled in the art that many changes in 
construction and circuitry and widely differing embodi- 
ments and applications of the invention will suggest 
themselves without departing from the scope of the 
present invention as defined in the claims. The disclo- 
sures and the description herein are intended to be 
illustrative and are not in any sense limiting of the inven- 
tion, defined in scope by the following claims. 

Claims 

1. A voice control input method for a capture device
(100), said method comprising the steps of:

(a) capturing (302) a first voice control input
command with a voice pickup component (102)
in said capture device (100);

(b) converting (400) said first voice control
input command into a first analog signal;

(c) converting (402) said first analog signal into
a first digital signal;

(d) converting (406) said first digital signal into
a first recognition pattern;

(e) comparing (408) said first recognition pat-
tern to at least one recognition pattern stored in
a command recognition table (126) in a static
memory (116) in said capture device (100); and

(f) when said first recognition pattern matches
(306) said at least one recognition pattern
stored in said command recognition table
(126), executing (308) a first set of instructions
linked to said at least one recognition pattern.

2. The voice control input method for a capture device
(100) according to claim 1 wherein said capture
device (100) is a scanner device.

3. The voice control input method for a capture device 
(100) according to claim 1 wherein said capture 
device (100) is a digital camera. 

4. The voice control input method for a capture device
(100) according to claim 1 wherein step (b) further
comprises the step (b1), and step (c) further com-
prises the steps (c1) through (c3):

(b1) inputting, to an analog-to-digital converter
(104) in said capture device (100), said first
analog signal;

(c1) converting said first analog signal, in said
analog-to-digital converter (104), to said first
digital signal;

(c2) transferring said first digital signal from
said analog-to-digital converter (104) to a con-
troller (106) in said capture device (100); and

(c3) storing (404), by said controller (106), said
first digital signal in a dynamic memory (118) in
said capture device (100).

5. The voice control input method for a capture device
(100) according to claim 1 wherein step (d) further
comprises the step (d1):

(d1) performing (406) a plurality of frequency
domain transforms on said first digital signal
stored in a dynamic memory (118) in said cap-
ture device (100), generating said first recogni-
tion pattern, wherein said first recognition
pattern is a spectral transform of said first dig-
ital signal.

6. The voice control input method for a capture device
(100) according to claim 1 wherein step (a) further
comprises the steps (a0a) through (a0j) performed
before step (a):

(a0a) loading (600) voice analysis software
(224) into a memory (218) in a host computer
system (200);

(a0b) connecting (604) said capture device
(100) to said host computer system (200);

(a0c) selecting (606) a predetermined function,
with said voice analysis software (224), for
training and voice analysis of at least one word
for invoking said predetermined function;

(a0d) capturing (610) a plurality of voice inputs
of said at least one word in said voice pickup
component (102) of said capture device (100);

(a0e) processing (610) said plurality of voice
inputs into a plurality of digital signals in said
capture device (100);

(a0f) sending (610) said plurality of digital sig-
nals from said capture device (100) to said host
computer system (200);

(a0g) analyzing (612) said plurality of digital
signals with said voice analysis software (224);

(a0h) developing (612) said at least one recog-
nition pattern from said analysis of said plural-
ity of digital signals with said voice analysis
software (224);

(a0i) storing (614) said at least one recognition
pattern in said memory (218) in said host com-
puter system (200);

(a0j) transferring (618) said at least one recog-
nition pattern in said memory (218) in said host
computer system (200) to said command rec-
ognition table (126) in said static memory (116)
in said capture device (100), wherein said at
least one recognition pattern is linked to said
first set of instructions stored in said command
recognition table (126) for performing said pre-
determined function; and

(a0k) repeating steps (a0c) through (a0j) for a
plurality of predetermined functions, wherein a
plurality of recognition patterns are developed
from a plurality of said plurality of voice inputs
for a plurality of said at least one words, and
further wherein said plurality of recognition pat-
terns are stored in said command recognition
table (126) in said static memory (116) in said
capture device (100), wherein each of said plu-
rality of recognition patterns are linked to one of
a plurality of predetermined sets of instructions
stored in said command recognition table (126)
for performing one of said plurality of predeter-
mined functions.

7. The voice control input method for a capture device
(100) according to claim 6 wherein step (a0a) is
replaced by the new step (a0a), steps (a0b), (a0f),
and (a0i) are eliminated, and step (a0j) is replaced
by the new step (a0j):

(a0a) accessing voice analysis software (124)
in said static memory (116) in said capture
device (100); and

(a0j) storing said at least one recognition pat-
tern in said command recognition table (126) in
said static memory (116) in said capture device
(100), wherein said at least one recognition
pattern is linked to said first set of instructions
stored in said command recognition table (126)
for performing said predetermined function.

8. The voice control input method for a capture device 
(100) according to claim 6 or claim 7 wherein said 
at least one word is language independent. 

9. The voice control input method for a capture device 
(100) according to claim 1 wherein said first recog- 
nition pattern, representing a voice annotation com- 
mand, matches said at least one recognition 
pattern stored in said command recognition table 
(126), and further wherein said first set of instruc-
tions executed in step (f) further comprises the fol-
lowing steps (f1) through (f8):

(f1) until a predetermined period of absence of
voice input has occurred, performing steps (f2)
through (f6);

(f2) capturing a stream of voice input with said
voice pickup component (102) in said capture
device (100);

(f3) converting said stream of voice input into a
second analog signal;

(f4) inputting, to an analog-to-digital converter
(104) in said capture device (100), said second
analog signal;

(f5) converting said second analog signal, in
said analog-to-digital converter (104), to a sec-
ond digital signal;

(f6) transferring said second digital signal from
said analog-to-digital converter (104) to a con-
troller (106) in said capture device (100);

(f7) storing, by said controller (106), said sec-
ond digital signal in said static memory (116) in
said capture device (100) as a voice annotation
file; and

(f8) connecting said voice annotation file to a
data file stored in said static memory (116).

10. The voice control input method for a capture device 
(100) according to claim 1 further comprising the 
steps of: 

(g) receiving button down input in a controller 
(106) from an annotation control button on said 
capture device (100); 

(h) until button up input is received in said con-
troller (106) from said annotation control but-
ton, performing steps (i) through (m);

(i) capturing a stream of voice input with said
voice pickup component (102);

(j) converting said stream of voice input into a
second analog signal;

(k) inputting, to an analog-to-digital converter
(104) in said capture device (100), said second
analog signal;

(l) converting said second analog signal, in said
analog-to-digital converter (104), to a second
digital signal;

(m) transferring said second digital signal from
said analog-to-digital converter (104) to a con-
troller (106) in said capture device (100);

(n) receiving said button up input in said con-
troller (106) from said annotation control button
on said capture device (100);

(o) storing, by said controller (106), said sec-
ond digital signal in said static memory (116) in
said capture device (100) as a voice annotation
file; and

(p) connecting said voice annotation file to a
data file stored in said static memory (116).